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Abstract 

The identification of multivariable state space models in innovation form is solved in a subspace identification framework using 
convex nuclear norm optimization. The convex optimization approach allows to include constraints on the unknown matrices 
in the data-equation characterizing subspace identification methods, such as the lower triangular block-Toeplitz of weighting 
matrices constructed from the Markov parameters of the unknown observer. The classical use of instrumental variables to 
remove the influence of the innovation term on the data equation in subspace identification is avoided. The avoidance of the 
instrumental variable projection step has the potential to improve the accuracy of the estimated model predictions, especially 
for short data length sequences. This is illustrated using a data set from the DaSIy library. An efficient implementation in the 
framework of the Alternating Direction Method of Multipliers (ADMM) is presented that is used in the validation study. 
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1 Introduction 

The generation of Subspace IDentification (SID) meth¬ 
ods for the identification of Linear Time-Invariant (LTI) 
state space models as developed originally in CniEilEI] 
derive approximate models rather than models that are 
“optimal” with respect to a goodness of fit criterium de¬ 
fined in terms of the weighted norm of the difference be¬ 
tween the measured output and the model predicted out¬ 
put. The approximation is based on linear algebra trans¬ 
formations and factorizations of structured Hankel ma¬ 
trices constructed from the input-output data that are 
related via the so-called data equation |25| . All existing 
SID methods aim to derive a low rank matrix from which 
key subspaces, hence the name subspace identification, 
are derived. The low rank approximation is in general 
done using a Singular Value Decomposition (SVD). 


* Part of the research was done while the first author was 
a Visiting Professor at the Division of Automatic Control, 
Department of Electrical Engineering, Linkoping University, 
Sweden. This work was partially supported by the European 
Research Council Advanced Grant Agreement No. 339681. 
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A number of recent developments have been made to 
integrate the low rank approximation step in SID with 
a goodness of fit into a single multi-criteria convex op¬ 
timization problem. These contributions were inspired 
by the work in [3] to approximate a constraint on the 
rank of a matrix by minimizing its nuclear norm. It re¬ 
sulted into a number of improvements to the low rank 
approximation step over the classically used SVD in SID, 

[laiisiiiniigiTiiiiiiisiiii]. 

When considering identifying innovation state space 
models, a common approach is to make use of instru¬ 
mental variables [7] . It is well known that the projection 
operation related to the use of instrumental variables 
may result into a degradation of the accuracy of the 
estimated quantities. 

In this paper we present a new SID method for identify¬ 
ing multivariable state space models in innovation form 
within the framework of nuclear norm optimization. The 
new SID method avoids the use of instrumental vari¬ 
ables. The method is a convex relaxation of Pareto opti¬ 
mization in which structural constraints are imposed on 
the unknowns in the data equation, such as their block- 
Toeplitz matrix structure. This Pareto optimization ap- 
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proach allows to make a trade-ofT between a Prediction 
Error type of optimality criteria, that is minimizing the 
(co-)variance of the one-step ahead prediction of a lin¬ 
ear Kalman filter type observer, on one hand, and find¬ 
ing an observer of lowest complexity, i.e. of lowest model 
order, on the other hand. A key result is that the struc¬ 
tural Toeplitz constraint is sufficient to find the minimal 
observer realization when the optimal one-step ahead 
prediction of the output is known. The incentive to es¬ 
timate a Kalman filter type of observer also justifies the 
constraint to attempt to minimize the variance of the 
one-step ahead prediction error. 

It is interesting to note that this key result stipulates 
precise conditions on the persistancy of excitation of the 
input (in open-loop experiments). For many instrumen¬ 
tal variable based SID methods it is still an open ques¬ 
tion what the persistency of excitation condition is on 
a generic input sequence to guarantee the algorithm to 
work for finite data length samples or to be consistent 

m- 

The convex relaxation of the new SID approach is 
denoted by Nuclear Norm Subspace IDentification 
(N2SID). Its advantages are demonstrated in a compar¬ 
ison study on real-life data sets in the DaSIy data base, 
[^. In this comparison study N2SID demonstrated to 
yield models that lead to improved predictions over two 
existing SID methods, N4SID and the recent Nuclear 
Norm based SID methods presented in m and with the 
Prediction Error Method (PEM) [H]. For the sake of 
brevity the results about only one data set are reported. 
For more information on a more extensive experimental 
analysis leading to similar conclusions we refer to jH] . 

The foundations for N2SID were presented in [53]. There 
the resulting optimization problem was solved using a 
Semi-Definite Programming (SDP) solver after a refor¬ 
mulation of the problem into an equivalent SDP prob¬ 
lem. In addition to this the problem formulation was 
approximated in order to obtain a problem of manage¬ 
able size for current SDP solvers. In this paper we will 
instead solve the problem with the Alternating Direc¬ 
tion Method of Multipliers (ADMM). ADMM is known 
to be a good choice for solving regularized nuclear norm 
problems as the ones we are solving in this paper, m- 
In order to get an efficient implementation we have cus¬ 
tomized the computations for obtaining the coefficient 
matrix of the normal equations associated with our for¬ 
mulation using Fast Fourier Transformation (FFT) tech¬ 
niques. This is the key to obtain an efficient implemen¬ 
tation. 

The paper is organized as follows. In Section 2 the iden¬ 
tification problem for identifying a multi-variable state 
space model in a subspace context while taking a predic¬ 
tion error cost function into consideration is presented. 
The data equation and necessary preliminaries on the 
assumptions made in the analysis and description of the 


subspace identification method is presented in Section 3. 
The multi-criteria optimization problem, the analysis of 
the uniqueness of solution and its convex relaxation are 
presented in Section 4. In Section 5 we explain how to 
obtain an efficient ADMM code. The results of the per¬ 
formances are illustrated in Section 6 in a comparison 
study of N2SID with two other SID methods, N4SID and 
the recent Nuclear Norm based SID methods presented 
in m and with PEM. Finally, we end this paper with 
some concluding remarks. 

1.1 Notations 

We introduce the Matlab-like notation that for a vector 
or matrix X G '^mxn Xm:n,p:q is 

the sub-matrix of X with rows m through n and columns 
p through q. If one of the dimensions of the matrix is n, 
then 1 : n can be abbreviated as just :. Moreover, with 
Nm:-i:n,p:q IS meant the sub-matrix of X obtained by 
taking rows m through n in reverse order, where m is 
greater than or equal to n. Similar notation may be used 
for the columns. 


2 The Subspace Identification Problem 

In system identification a challenging problem is to iden¬ 
tify Linear Time Invariant (LTI) systems with multi¬ 
ple inputs and multiple outputs using short length data 
sequences. Taking process and measurement noise into 
consideration, a general state space model for LTI sys¬ 
tems can be given in so-called innovation form, |25| : 

x{k -|- 1) = Ax{k) + Bu{k) + Ke(k) 
y(k) = Cx(k) + Du{k) + e{k) 

with x{k) G IR",y(fc) G M.P,u{k) G and e{k) a zero- 
mean white noise sequence. Since we are interested in 
short data sets no requirement on consistency is included 
in the following problem formulation. 

Problem Formulation: Given the input-ouput (i/o) data 
batches {u{k), j/(fc)}^j^, with N > n and assumed to be 
retrieved from an identification experiment with a sys¬ 
tem belonging to the class of LTI systems as represented 
by (1), the problem is to determine approximate system 
matrices {At, Bt, Ct, D, Kt) that define the fi-th order 
observer of “low” complexity: 

xxik + 1 ) = ATXrik) + BxUvik) + Krivvik) — CtXt 

yv{k) = Cririk) + Duy{k) 

( 2 ) 

such that the approximated output yv{k) is “close” 
to the measured output yv{k) of the validation pair 


2 


{uv{k),yv{k)}^^^ as expressed by a small value of the 
cost function, 


1 


N„ 

^ hvik) 


fc=l 


Uk)\\l 


( 3 ) 


The quantitative notions like “low” and “close approx¬ 
imation” will be made more precise in the new N2SID 
solution toward this problem. The solution to this prob¬ 
lem is provided under 2 Assumptions. The first is listed 
here, the second at the end of Section 3. 


Assumption A.l. The pair {A,C) is observable and 


the pair (A, 


B K 


is reachable. 


A key starting point in the formulation of subspace meth¬ 
ods is the relation between structured Hankel matrices 
constructed from the i/o data. This relationship will as 
defined in be the data equation. It will be presented 
in the next section. 


3 The Data Equation, its structure and Prelim¬ 
inaries 


The Hankel matrices from the output y{k) and the in¬ 
novation e{k) are defined similarly and denoted by 
and Eg TV, respectively. The relationship between these 
Hankel matrices, that readily follows from the linear 
model equations in (5), require the definition of the fol¬ 
lowing structured matrices. First we define the extended 
observability matrix Os- 


O 


T 

s 






( 7 ) 


Second, we define a Toeplitz matrix from the quadruple 
of systems matrices {A, B, C, D} as: 


D 0 ••• 0 
CB D 0 

CA^-^B ■■■ D 


( 8 ) 


and in the same way we define a Toeplitz matrix Ty^s 
from the quadruple {A,K,CA}- Finally, let the state 
sequence be stored as: 


Let the LTI model (1) be represented in its so-called 
observer form: 

j x{k -1-1) = {A — KC)x{k) -\- {B — KD)u{k) Ky{k) 
[ y{k) = Cx{k) Du{k) e(fc) 

( 4 ) 

We will denote this model compactly as: 

f x{k -|- 1) = Ax{k) -\- Bu{k) Ky{k) 

\ y{k) = Cx{k) Du{k) -\- e{k) 

with A the observer system matrix (A — KC) and B 
equal to {B — KD). Though this property will not be 
used in the sequel, the matrix A can be assumed to be 
asymptotically stable. 

For the construction of the data equation, we store the 
measured i/o data in block-Hankel matrices. For fixed 
N assumed to be larger then the order n of the under¬ 
lying system, the definition of the number of block-rows 
fully defines the size of these Hankel matrices. Let this 
dimensioning parameter be denoted by s, then the Han¬ 
kel matrix of the input is defined as: 


Atv — 


a;(l) x{2) 


x{N — s -I- 1) 


( 9 ) 


Then the data equation compactly reads: 

Ys,N = OsXn + Tu,sUs,N + Ty^sYs,N + Es^N- ( 10 ) 

This equation is a simple linear matrix equation that 
highlights the challenges in subspace identification, 
which is to approximate from the given Hankel matrices 
Yg^N and Us^n the column space of the observability ma¬ 
trix and/or that of the state sequence of the observer (5). 

The equation is highly structured. In this paper we fo¬ 
cus on the following key structural properties about the 
unknown matrices in (10): 

(1) The matrix product OsX^ is low rank when s > n. 

(2) The matrices g and Ty g are block-Toeplitz. 

(3) The matrix Es,n is block-Hankel. 

The interesting observation is that these 3 structural 
properties can be added as constraints to the multi¬ 
criteria optimization problem considered while preserv¬ 
ing convexity. This is demonstrated in Section 4. 



u(l) 

u{2) ■ 

• u{N — s -1- 1) 


Us,N = 

u{2) 

u(3) 


■ (6) 


u{s) 

u{s 1) • 

u{N) 



The analysis in Section 4 requires the following prelimi¬ 
naries. 
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Definition 1 J25f : A signal u(k) G K™ is persistently 
exciting of order s if and only if there exists an integer 
N such that the matrix Us^n has full row rank. 












Lemma 2 /^; Consider the state space model in inno¬ 
vation form (1) and let all stochastic signals be station¬ 
ary and ergodic, let Assumption A.l be satisfied and let 
the input u{k) be quasi-stationary and persistently 

exciting of order s + n, then: 


lim — 
N—^ao N 


Xn 

Us,N 




> 0 


Assumption A.2. Consider the model (5), then there 
exists an integer N such that the compound matrix, 

Xn 

Us,N 


has full row rank. 


Corollary 3 Let the conditions stipulated in Lemma 2 
hold, and let u{k) be statistically independent from the 
innovation sequence e(€) for all k,£, then, 


4 N2SID 

4-.1 Pareto optimal Subspace Ldentification 


N—¥oo N 


Xn 

Us,N 

Y,,n 


vT ttT \rT 
^s,N 


> 0 


When assuming the optimal observer given, the quantity 
y{k) is the minimum variance prediction of the output 
and equal to y{k) — e{k). Let the Hankel matrix Y^^n be 
defined from this sequence y{k) as we defined from 
y{k). Then the data equation (10) can be reformulated 
into: 


Proof: Since e{k) is white noise, it follows that 
E[x(fc)e(t')^] = 0 (with E denoting the expectation 
operator), for £ > k. This in combination with the inde¬ 
pendency between u{k) and e{£), the white noise prop¬ 
erty of e(fc) and the ergodicity or the quasi-stationarity 
of the signals yields. 


lim — 

N—^oo X 


Xn 

Us,N 

Es,N 


vT ttT pT 
^N ^s,N ^s,N 


> 0 


( 11 ) 


Considering model (1), let the block-Toeplitz matrices 
g and Tg.s be defined as the Toeplitz matrix Tu,s in (8) 
but from the quadruples {A, B,C, D) and {A, K,C, I), 
respectively. Let Oj = 




A- 


-'S— 1 




Then we can state the following alternative data equa¬ 
tion: 


Ys,N = OsXn -I- T^^gUs,N + Te^sEs,N 


Ys,n = OsXn -\- Tu^sUs,n + Ty^sYg^N- ( 12 ) 

Let Tp^m denote the class of lower triangular block- 
Toeplitz matrices with block entries px m matrices and 
let Hp denote the class of block-Hankel matrices with 
block entries of p column vectors. Then the three key 
structural properties listed in Section 3 are taken into 
account in an optimization problem seeking a trade-off 
between the following cost functions, 

rank^^Ts^AT - QuUs,n - OyYg^N^ 

and TrE[^y(k) - j(k)^ ^y(k) - j(k)j ] 

Here E denotes the expectation operator. The matrix 
Fg^N S ?{p is the (block-) Hankel matrix approximat¬ 
ing the Hankel matrix Yg^N mrd constructed from the 
approximation of the one-step ahead prediction of the 
output denoted by j(k) in the same way Yg^N was con¬ 
structed from y(k). Further, we have the following con¬ 
straints on the matrices 0„ € 7p,m and 0y € Tp,p. 


By this data equation, we have that. 


Xn 


1 

o 

o 

1__ 


Xn 

Us,N 

= 

0/0 


Us,N 

Yg,N_ 


O T’ T 

-‘-u,s 


Eg^N 


An optimal trade-off between the above two cost func¬ 
tions is called a Pareto optimal solution. Moreover, the 
Pareto optimization problem is not tractable. For that 
purpose we will develop in the next subsection a convex 
relaxation of the cost functions. This will make it possi¬ 
ble to obtain all Pareto optimal solutions using scalar- 
ization. 


The results follows from (11) and the fact that the matrix 
Tg^s is square and invertible. □ 

Based on this result the following assumption is stipu¬ 
lated. 


Before stating this convex relaxation an analysis is made 
on the additional structure that can be imposed on the 
block-Toeplitz matrices 0„ and 0y and/or under what 
conditions their block-Toeplitz structure is sufficient to 
find a unique solution. 
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Ji..2 Additional structure in the block-Toeplitz matrices 
and Ty s 


(13) only with fixed to , let s > n and let As¬ 
sumptions A.l and A.2 he satisfied, Then, 


In this section we analyse the additional structure 
present in the block-Toeplitz matrices and ^ as 

well as the conditions under which the block-Toeplitz 
structure is sufficient to find the system matrices 
{At,Bt,Ct,D,Kt)- These conditions, not includ¬ 
ing the additional structural constraint highlighted in 
Lemma 4, is summarized in Theorem 1 of this paper. 


min rank! Fs AT - QuUsm - ©uF at ) = n 
e„Grp,„.e„Grp,p V ’ " ’ / 

Further the arguments optimizing the above optimization 
problem, denoted as ©„, 0^ are unique and equal to, 

B — T B — T 

'^u — -^u^s — -^y,s 


Lemma 4 Let s > n, then we can partition the block- 
Toeplitz matrices T^^s OLXid Ty^ ^, defined in the data equa¬ 
tion (10) as. 


Tu,n I 0 
Lhu,s—n I n 


(14) 


and likewise for the matrix Ty^g. Here the matrices 
Hu^s-n o,nd Hy^s-n can he decomposed as. 


with Tu^si Ty^s the true underlying block-Toeplitz matrices 
in the data equation{10). 

Proofi Let (z Tp^m: € f~p,pj then, 




s,N 


QuUs,N 




= Ys^N — (Fi.s + Su)Us,N — 
+ Sy)Ys^N 

— ^uUs^N ^yYs^N 


Therefore, 




c 

CA 


CA‘ 


\ A^-^B • B I A^-^K ■ ■■ k\ 


Proof: Follows by construction. 


(15) 

□ 


rank 




N — QuUs,N — QyYs.N 1 = 




rank(^e>^Xjv - 5JJs,n - ^yYs,N^ 

Application of Sylvester’s inequality [5^ and under As¬ 
sumption A.2, we further have. 


Remark 5 Lemma 4 can be used to impose an additional 
constraint on the block-Toeplitz matrices 0„ and By. Lf 
we partition these block- Toeplitz matrices conformal their 
counterparts T^^s o,nd Ty^g cis highlighted in Lemma 4, as 
follows. 


0 


U,S 


0,. 


I B 

s-n I '-’u,s-r. 


(likewise for Qy^s), then for the case s > 2n we can 
impose the following additional constraint. 


ranklF.AT - QuUs.n - ©uF.at ) = rankf 


Og 


Su Sy 


( 16 ) 

First notice that under Assumption A.l the rank of 
this matrix is n for F = 0 and <5y = 0. Since the 

rank( 


Og Su Sy 


> rank for all 6u,Sy, we have 


that n is the minimal value of the rank in (16). 


It will now be shown that this minimal value of the rank, 
can only be reached for both Su and Sy equal to zero. 


rankf 


u,s—n y,s—n 


= n 


For that purpose, let t = {U ^ I30 ^ se¬ 

quence of arbitrary matrices that define the lower trian¬ 
gular block-Toeplitz matrix A'*(t) as: 


The additional constraint highlighted in Remark 5 can 
be reformulated, as done e.g. in nnng, as a rank mini¬ 
mization constraint, that can be relaxed to a convex con¬ 
straint using the nuclear norm. However we seek to avoid 
imposing this additional constraint in order to minimize 
the number of regularization parameters. The basis here- 
fore is provided in the next Theorem. 


A«(t) = 


<1 0 ••• 0 

t2 ti : 

tg tg-i ■ ■ • ti 


£ K®PXs(m+p) 


Theorem 1 Consider the observer in (1) with x{k) € 
K" and consider the rank optimization problem in Eq. 


The columns of the compound matrix 


Su Sy 


(16) 


can always be permuted into a matrix of the form A® (t) 


5 




















and since column permutations do not change the rank 
of a matrix we have that, 


rank! 


Os Su Sy 


= rank( 


Os A%t) 


all Pareto optimal solutions of the convex reformulation 
of the N2SID problem can be obtained by solving: 

6 "Hp, ©u ,3 e Tp,m, By,s £ Tp,p llPs.iV “ ©ti,“ Qy,aYs^N 

+^Ef=i ll?/(fc)-7(fc)lli 


Now we show that the following condition 


for all values of A S [0, oo). 


(18) 


rank(^ Os A^{t) 


implies that A®(t) has to be zero. In order for the above 
rank constraint to hold we need A®(t) to be of the fol¬ 
lowing form: 


Remark 4 The method can he extended to other related 
identification problems. For example one way to consider 
the identification problem of an innovation model with 
absence of a measurable input, is to consider the following 
convex relaxed problem formulation: 


t2 


0 


0 0 
0 0 


= o. 


S Ql (J 2 


qa-l qa 


(17) 


ta ta-1 ■ ■ ■ t2 tl 

The fact that s > n, we have that rank^Os_i^ = n and 

therefore we can deduce from the first p{s — 1) rows of 
the last p -|- m columns in the expression (17) that, 


9s = 0 ^ tl = 0 


minr,,„GWp,ep,,Grp,p IITs.at - 0y,sPs.Ar|U+ 


It is well-known that the problem (18) can be recast as a 
Semi-Definite Programming (SDP) problem, [HIS], and 
hence it can be solved in polynomial time with standard 
SDP solvers. The reformulation, however, introduces ad¬ 
ditional matrix variables of dimension N x N, unless the 
problem is not further approximated using randomiza¬ 
tion techniques as in |23| . In section 5 we will present an 
alternative exact method using ADMM inspired by its 
successful application in HU. 


Using this result, and the Toeplitz structure of A®(t), we 
can in the same way conclude from the first p{s — 1) rows 
and from the columns {s — 2)(m+p) -I-1 to (s — l)(m-|-p) 
in (17) that. 


9s_i = 0 ^ ^2 = 0 etc. 


Hence there cannot be a A'^(t) with the given Toeplitz 
structure that is different from zero such that 


rank! 


Os A^t) 


= n. Hence the minimal value of the 


rank of the matrix 


Os Su Sy 


in (16) w.r.t. Su, Sy yields 


zero value of both. This concludes the proof. 


4-.3 A convex relaxation 


4.4 Calculation of the system matrices 


The convex-optimization problem (18) yields the esti¬ 
mates of the quantities Ts^jVi^u.s and 0y,s- Since the 
outcome depends on the regularization parameter A, 
let us denote these estimates as rs_Af(A), 0„,s(A) and 
01/,s (A) respectively. The determination of the system 
matrices starts with an SVD of the “low rank” approxi¬ 
mated matrix as follows: 


rs,7v(A) — 0ti,s(A)[/s,Ar — 0y,s(A)Fu,Ar = 


Unix) I * 


Sn(A) I 0 
0 I * 



( 20 ) 


A convex relaxation of the NP hard problem formulation 
in (13) will now be developed. The original problem is 
reformulated in two ways. First, the rank operator is 
substituted by the nuclear norm. The nuclear norm of 
a matrix X denoted by ||A||* is defined as the sum of 
the singular values of the matrix A. It is also known as 
the trace norm, the Ky Fan norm or the Schatten norm, 
HU. This is known to be a good approximation of the 
rank operator when it is to be minimized, 013. Second, 
the minimum variance criterium is substituted by the 

following sample average ^ Wdik) - 7(fc)ll2- 

By introducing a scalarization parameter A S [0,oo), 
which can be interpreted as a regularization parameter. 


where h is an integer denoting the h largest singular 
values and the notation * denotes a compatible matrix 
not of interest here. The selection of n is outlined in the 
algorithmic description given next. 

The algorithm requires in addition to the input-output 
data sequences the user to specify the parameter s to 
fix the number of block rows in the block-Hankel ma¬ 
trices Us^N and Ys,Af and an interval for the parameter 
A denoted by A = [Amin, Amax]- As for the implementa¬ 
tion described in HU, which we will refer to as WNNopt, 
the identification data set could be partitioned in two 
parts. The first part is referred to as the ide-1 part of 
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the identification data set and the remaining part of the 
identification data set is referred to as the ide-2 part. 
This splitting of the data set was recommended in m to 
avoid overfitting. In the N2SID algorithm three variants 
can be substituted in the algorithmic block ’compute 
Aij{Xy for j = 1,2,3. This algorithmic block performs 
the actual calculation of the one-step ahead predictor 
and the three variants are summarized after the descrip¬ 
tion of the core part of N2SID. 

N2SID algorithm: 

Grid the interval A = [AminjAmax] in N different 
points, e.g. using the Matlab 

notation A = logspace(| log(Ai„in), log(Amax), 
for i=l :L, 

Solve (18) for A = A(*) and data set ide-1. 

Compute the SVD as in (20) for A = A(i). 

Select Select the model order n form the singular val¬ 
ues in (20). This can be done manually by the user or 
automatically. Such automatic selection can be done 
as in the N4S1D implementation in m as highlighted 
in m- order the singular values in (20) in descending 
order, then select that index of the singular value that 
in logarithm is closest to the logarithmic mean of the 
maximum and minimum singular values in (20). 
Compute system matrices Kt} accord¬ 

ing to the procedure ’Compute A4j(A)’ for j = 1, 2, 3 
and A = A(z). 

Using the estimated system matrices {At, Bt, Ct, T’}, 
and the validation data in ide-2, compute the simu¬ 
lated output y(k, A) as, 

XT{k -I-1) = ArXTik) -I- BTu{k) 

y{k, X) = CTXT{k)+Vu{k) (21) 

and evaluate the cost function, 

N 

J{X) = Y.\\y{k)-y{k,ml 

i=l 


end 

Select A4j(Aopt) with Aopt given as: 

Aopt = niin J (A) 


The subsequent three ways to compute the model are 
summarized as: 


Ufi as done in classical SID methods by considering Ufi 
to be an approximation of the extended observability 
matrix Og, see e.g. |25| . 

STEP 2: With Un and the estimated matrix Ty ^ we 
exploit that the latter matrix approximates the block- 
Toeplitz matrix Ty^g to estimate the observer gain Kt 
via the solution of a standard linear least squares prob¬ 
lem. This is seen as follows. Let us assume the block 
Toeplitz matrix Ty^g be given and denoted explicitly 


as, 


T — 


0 0 
CK 0 
CAK CK 


0 0 
0 0 
0 0 


CA^-^K ■■■ CKO 

If we know the matrix Og, we can write the following 
set of equations, 


Og{l : (s - l)p, :)K = TyAp + I ■ psA ■ p) 

Let us now denote the first (s — l)p rows of the matrix 
Un{Xi) by Og-i^T and let us denote the submatrix of 
the matrix 0y^s(Ai) from rows p -I- 1 to row ps and 
from column 1 to p by Ty g{p 1 : ps, 1 : p), then we 
can estimate Kt from: 

min WOs-x^tKt - Ty,s{P + 1 : Ps, 1 : p)|p (22) 


This estimate of the observer gain is used to estimate 
the system matrix At as: 


At = At + KtCt (23) 

STEP 3: Let the approximation of the observer be de¬ 
noted as: 

XT^k -I- 1) = ATXT{k) + BTu{k) -\- KTy{k) 

y{k) = CTXT{k)+'Du{k) (24) 

Then the estimation of the pair Bt, D and the initial 
conditions of the above observer can again be done 
via a linear least squares problem as outlined in [25] 
by minimizing the RMS value of the prediction error 
y(k) — y{k) determined from the identification data in 
ide-1. The estimated input matrix Bt is then deter¬ 
mined as: 

Bt=Bt + KtD (25) 


Compute Al 2 (A): 

STEP 1: as in Compute A4i(A). 

STEP 2: Derive an estimate of the state sequence of 
the observer (24) from the SVD (20), where for the 


Compute All(A): 

STEP 1: From the SVD in (20), and the selected model 
order h, the pair At,Gt is derived from the matrix 


7 




sake of compactness again the system symbol Xrik) 
will be used, 


xt(1) xt(2) 


xt{N - s + 1) 




For our problem the linear operator consists of a sum of 
Hankel and Toeplitz operators, and we will show how 
FFT techniques can be used also for this operator. For 
more details on the AD MM algorithm see the appendix. 


Using the singular values this approximation could 
also be scaled as ^/'En{X)V^ (A). 

STEP 3: Knowledge of the estimated state sequence of 
the observer (24) turns the estimation of the system 
matrices At-, Bt, Ct, Kt and the observer inititial 
conditions into linear least squares problem. The esti¬ 
mated pair {At, Bt) can be computed from this quin¬ 
tuple as outlined in (23) and (25), respectively. 

Compute Af 3 (A): 

STEP 1 and 2: as in Compute Afi(A). 

STEP 3: With U-n and the estimated Markov param¬ 
eters in T® g we could similarly to estimating the 

Kalman gain, also estimate the pair Bt, D via a lin¬ 
ear least squares problem. The matrix Bt can be 
estimated from Bt as outlined in (25), 


5 .1 Circulant, Toeplitz and Hankel Matrices 


We define the circulant matrix operator C" 
of a vector a; e K" via 


Xi 



X 3 

X2 

X2 

Xi 

Xn 


X 3 


X2 

Xi 



Xn—1 




Xn 

Xn 

^n—1 


X2 

Xl 




(27) 


We also define the Hankel matrix operator : 


Xi X 2 ■ ■ ■ X- 


In the experiments reported in Section 6 use will be made 
of A2SID Algorithm with the model computation block 
Compute Af i(A). It turned out that the N2SID algorithm 
is much less sensitive to overparametrization compared 
as compared to WNNopt. For that reason we will use the 
whole identification data set in all steps of the N2SID al¬ 
gorithm for the experiments reported in Section 6 , i.e. 
ide-1 and ide-2 are identical and equal to the identifi¬ 
cation data set. 




X2 


(28) 


^771. ^m+n —1 


For a vectors; € ^ it holds that = 

Hankel operator is the lower 
left corner of the circulant operator where the columns 
are taken in reverse order. We also define the Toeplitz 
operator T" : —)• Qf ^ vector x G via 


5 ADMM 


^n—l 


Xi 


The problem we like to solve is exactly of the form in 
( 20 ) in [TT], i.e. 


T-{x) 


^n +1 


(29) 


min ||Al(a;) -I- Ao||* + -(a; — a)^H{x — a) (26) 


^ 2 n —1 . 


for some linear operator A{x) and some positive semidef- 
inite matrix H. In the above mentioned reference the 
linear operator is a Hankel matrix operator, and this 
structure is used to tailor the ADMM code to run effi¬ 
ciently. Essentially the key is to be able to compute the 
coefficient matrix related to the normal equations of the 
linear operator in an efficient way using FFT. This ma¬ 
trix M is defined via 

Aladj(Al(a;)) = Mx, Va; 


We realize that T'^{x) = 'Hf_”’.’!.\.i(x), i.e. a Toplitz op¬ 
erator can be obtained from a square Hankel operator 
by taking the columns in reverse order. We are finally 
interested in upper triangular Toeplitz operators with 
and without zeros on the diagonal, and we remark that 
these are easily obtained from the normal Toeplitz op- 

r n T 


erator by replacing x with 



, which will be upper 


triangular with a non-zero diagonal if a; € K" and with 
a zero diagonal if a; S 


where Aladj(0 is the adjoint operator of Al(-). Similar 
techniques have been used for Toeplitz operators in 
and are closely related to techniques for exploiting 
Toeplitz structure in linear systems of equations, [3]. 


5.2 The Fourier Transform and Hankel Matrices 

It is well-known, j3], that if we let A" G be the 

discrete Fourier transform matrix of dimension n, then 













the circulant matrix can be expressed as 


5.^ Forming the Coefficient Matrix 


C^lx) = -(J'")"diag(J'"x)J'”. (30) 

n 

From this we immediately obtain that the Hankel matrix 
can be expressed as 

= ^i/^diag(J^^a;)G (31) 

where N = n + m— 1, F = , G = F-^n-.-i-.i and 

FI = F.^n:n+m-i- This expression will make it easy for 
us to represent the adjoint of the Hankel operator. It is 
straight forward to verify that the adjoint : 

]gmxn ]gn+m-l jg gjygjj ]3y 

= ^F^diagiHZG^). 

Notice that we are abusing the operator diag(-). In case 
the argument is a vector the operator produces a diago¬ 
nal matrix with the vector on the diagonal, and in case 
the argument is a square matrix, the operator produces 
a vector with the components equal to the diagonal of 
the matrix. 


A key matrix in the ADDM algorithm is the matrix M 
defined via 

Aa,di{A{x)) = Mx, 'ix. 

We will now show how this matrix can be formed ef¬ 
ficiently using the Fast Fourier Transform (FFT). We 
partition the matrix as 

Afii Mi2 Mi3 

M = M ^2 M 22 M 23 

^13 ^23 ^33 

where the partition is done to conform with the partition 
X = {y, V, w). It is then clear that Mu is defined via 

n^X\n^^'^Hy)) = Muy, Vy 

The left hand side can be expressed as 

^F^diag (HH^diag (Fy) GG") . 

From the identity 


5.3 The Linear Operator A 


We will now present the linear operator that we are inter¬ 
ested in for the SISO case: A : x x —>• 

where 


diag {A diag (x) B) = (Aq B"^)x 

where © denotes the Hadamard product of matrices, it 
follows that 

Mu = 0 (g^)) F 


A(x) = «<=•")(§) + r'’ 

where n = iV — s -|- 1, x = {y,v,w) with y G 
V G UF w € V G and W G By 

taking V = —Us^n and W = —Ys^n we obtain the linear 
operator for N2SID. Then, v and w are the first columns 
of the Toeplitz matrices Tu^s and respectively. We 
can express A{x) in terms of Hankel operators as: 

The adjoint of this operator can be expressed in terms 
of the adjoint of the Hankel operator as 

Adj(^)= 



The efficient way to form Mu is to first compute F,G 
and F[ using an FFT algorithm, and then to form the 
matrix 

X = (HH^) © (gG^) . 

After this one should apply the inverse FFT algorithm 
to A^, and then to the transpose of the resulting matrix 
once more the inverse FFT algorithm. 

The expressions for the other blocks of the matrix M 
can be derived in a similar way, and they are given by: 

Mu = ^F^ © {GV^^)) 

^13 = © (gW^G^i,,)) 

M22 = ({HVV^H^) © ( g ^) ) 

M23 = 

M 33 = ({HWW^H^) © (^)) 
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Notice that for the last three blocks the matrices F, 
G, and H are defined via a discrete Fourier transform 
matrix of order k = 2 s — 1 . 

5.5 MIMO Systems 

So far we have only discussed SISO systems. For a gen¬ 
eral p X m system we may write the linear operator 
A : X X as: 

P 

A{x) = '^Ai{x^) 0 


for the SISO case for all i. Below are formulas given for 
sub-blocks of each of the other matrices 

© [cVf G%)) 

^13, = © (GWIG%)) 

M22jk = [{HV,V^H^) © (^) ) 

M 23 ,fc = {{HV.W^H^) © (^)) 

^33,fc = {{HW.W^H^) © (^)) 


where 




i=i 


E nj(s,s) 

i=i 




0 

^ T 


Vj 




where Vj = —C/f ^ and Wj = Hankel matrices 

defined from uj and yj, i.e. from the jth inputs and out¬ 
puts, respectively, and where is the ith basis vector. 
Hence we may interpret each term Ai(xi) as defining a 
MISO system in the sense that each predicted output can 
be written as a linear combination of all the intputs and 
outputs. If we write the adjoint variable Z in a similar 
way as Z = Zi^Ci, it follows that the adjoint oper¬ 

ator is given by Adj(^) = (Adj,i( 2 'i),..., Adj,p(^p))- 
Hence the matrix M will now be blockdiagonal with 
blocks defined from the identity 


xiadj ,i ^Ai (Xj)) — , Vx^, i — 1 , . . . , p. 


It is not difficult to realize that the operators Aa,di,i{Zi) 
will be given by 


Aadj.iiZi') — 




Hence each of the blocks Mi will have a similar struc¬ 
ture as the M matrix for the SISO system. However, 
the sub-blocks M 12 , M 13 , M 22 , M 23 and M 33 will have 
sub-blocks themselves reflecting that fact that there are 
several inputs and outputs. Mu will be the same as Mi 


It is interesting to notice that these formulas do not 
depend on index i. This means that all Mi are the same. 


6 Validation Study 

In this section we report results on numerical experi¬ 
ments using real-life data sets. We will make use of some 
representative data sets from the DalSy collection, [5]. 
For preliminary test with the new N2SID mehod based 
on academic examples, we refer to |23| . 

The numerical results reported in Subsection 6.3 were 
performed with Matlab. The implementations have been 
carried out in MATLAB R2013b running on an Intel 
Core i7 CPU M 250 2 GHz with 8 GB of RAM. 

6.1 Data selection and pre-processing 

From the DalSy collection, [2], five representative data 
sets were selected. These sets contain SISO, SIMO, 
MISO and MIMO systems. Information about the se¬ 
lected data sets is provided in Table 1. In order to 
evaluate the performances for small length data sets, 
data sets of increasing length are considered. The data 
length is indicated by Wde in Table 2 for each data set 
of Table 1 . To test the sensitivity of the identification 
mehods with respect to the length of the identification 
data set, Wde is increased from a small number, as 
compared to the total number of samples available, in 
a way as indicated in Table 2 From each identification 
and validation data set the offset is removed. Data set 2 
from the continuous stirred tank reactor is scaled in 
such a way that both outputs have about the same nu¬ 
merical range. This is achieved by scaling the detrended 
versions of these outputs such that the maximum value 
of each output equals 1 . 

Since many of the data sets contain poorly excited data 
at their beginning, the first del samples are discarded 
from each data set. The actual value of del for each data 
set is listed in Table 2. Finally each identified model is 
validated for each test case on the same validation data 
set. These validation data sets contain the Wai samples 
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Table 1 

Five benchmark problems from the DalSy collection, [2]; Ntot is the total number of data samples available 


Nr 

Data set 

Description 

Inputs 

Outputs 

Wot 

1 

96-007 

CD player arm 

2 

2 

2048 

2 

98-002 

Continuous stirring tank reactor 

1 

2 

7500 

3 

96-006 

Hair dryer 

1 

1 

1000 

4 

97-002 

Steam heat exchanger 

1 

1 

4000 

5 

96-011 

Heat flow density 

2 

1 

1680 


sample index del\ indicates 


Table 2 

The increasing length Mde of the data sets used for system identification starting with the 
the length of the validation data set starting with the sample index max{Ni^g) + 1. 


Nr 


A^ide 


1 

80 

120 

150 

175 

200 

300 

2 

100 

150 

200 

300 

400 

500 

3 

80 

100 

120 

140 

160 

180 

4 

150 

200 

300 

500 

750 

1000 

5 

175 

200 

250 

300 

350 

400 


following the sample with index max(Aiide) + l- The value 
of fVvai is listed in Table 2. 

6.2 Compared Identification methods 

Three SID methods are compared in the tests. Their key 
user selection parameters are listed in Table 3. One of 
the key user selection parameters of the SID methods is 
the number of block rows s of the Hankel data matrices. 
In methods like N4SID or WNNopt of Table 3 a distinction 
could be made in the number of block rows of so-called 
future and past Hankel matrices. Such differentiation is 
not necessary for the N2SID algorithm. Since such dif- 
ferentation is still an open research problem we opted in 
this simulation study to take the number of block rows of 
the future and past Hankel matrices in N4SID or WNNopt 
equal to the number of block rows in the N2SID algo¬ 
rithm. Table 3 also lists the interval of the regularization 
parameter A to be specified for the Nuclear Norm based 
methods. 

For N4SID we further used the default settings except 
that the Kalman filter gain is not estimated, a guaran¬ 
teed stable simulation model is identified, no input de¬ 
lays are estimated and these are fixed to zero, and finally 
no covariance estimates are determined. The order se¬ 
lection is done with N4SID using the option ’best ’, |15| . 
This results in a similar automatic choise as we have im¬ 
plemented for N2SID. 

For WNNopt in im the weighting according to the CVA 
method is used. Also here no Kalman gain is estimated 
and the input delay is set to zero. As indicated in m an 
’identification’ and a ’validation’ data set is needed to 
perform the selection of the regularization parameter A 



del 

Wal 

400 

500 

600 


120 

500 

600 

700 

800 


200 

1500 

200 

250 

300 

400 

120 

600 

1250 

1500 

1750 


200 

1500 

450 

500 

550 

600 

200 

1000 


in order to avoid overfitting. In this paper both data sets 
are retrieved from the identification data set of length 
A^ide by splitting it into two almost equal parts differing 
in length by at most one sample. 


For N2SID we used the ADMM algorithm presented in 
m, where we have provided our own routines for com¬ 
puting M as explained in section 5. As explained in m 
we also make use of simultaneous diagonalization of M 
and the positive semidefinite matrix H in (26) in or¬ 
der not to have to make different factorizations for each 
value of the regularization parameter A. The maximum 
number of iterations in the ADMM algorithm have been 
set to 200, the absolute and relative solution accuracy 
tolerances have been set to 10“®, and 10“^, respectively. 
The parameters used to update the penalty parameter 
have been set to t = 2 and p, = 10. We label our N2SID 
Algorithm with N2SID). Also we do not split the data for 
N2SID. We have also carried out experiments when we 
did split the data. This resulted in most cases in compa¬ 
rable results and in some cases even better results. 


The SID methods are compared with the prediction 
error method PEM of the matlab System Identification 
toolbox m- Here the involvement of the user in speci¬ 
fying the model structure is avoided by initializing PEM 
with the model determined by N4SID. Therefore, the 
model order of PEM is the same as that determined by 
the N4SID method. In this way no user selection param¬ 
eters are needed to be specified for PEM. This is in agree¬ 
ment with the recommendation given on the PEM help 
page http;//nf.nci.org.au/facilities/software/ 
Matlab/toolbox/ident/pem. html when identifying 
black-box state space models. 
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Table 3 


Three SID methods and their user selection parameters and the number of block-rows in the data Hankel matrices s. 


Method 

X 

S 

Weighting 

N4SID [[T5)l 

/ 

15 

automatic 

WNNopt, [[TT] 1 


15 

CVA 

N2SID Algorithm 

[10-1®, 10^] 

15 

/ 


6.3 Results and Discussion 


The three SID methods in Table 3 and the PEM method 
will be compared for the data sets in Table 1. The re¬ 
sults of this comparison are for each data set summa¬ 
rized in two graphs in the same figure. The left graph 
of the figure displays the goodness of fit criterium VAF. 
This is defined using the identified quadruple of system 
matrices [At, Bt, Ct, D] obtained with each method to 
predict the output using the validation data set. Let the 
predicted output be denoted by y«(fc) for each method, 
and let the output measurement in the validation data 
set be denoted by Then VAF is defined as: 



1 

-^val 


Wvvik) - yv{k)\\l 


100% (32) 


The right graph of the figure displays the model com¬ 
plexity as defined by the model order of the state space 
model. Both the goodness of fit and estimated model or¬ 
der are graphed versus the length of the identification 
data batch as indicated by the symbol Aide in Table 2. 


N2SID, but this a the cost of a lower VAF. For Aide = 150 
and 175, the same order as for N4SID was detected, how¬ 
ever with worser VAF as compared to both N4SID and 
PEM. For Aide > 400 the limit set on the model order, 
which was 10 in all experiments, was selected by WNNopt, 
sometimes but not always yielding a better VAF. 


The efffect of the use of instrumental variables and the 
splitting of the identification data set to avoid overfitting 
on the order selection is clear from the singular values 
of N2SID and WNNopt given in Figure 2 for Aide = 600. 
This plot visually supports the selection of a 7-th order 
model by N2SID and it also explains why WNNopt selects a 
larger model order. One possible explanation is that the 
instrumental variables and projections have “projected 
away” crucial information in the data. 


All these results are obtained in a similar “automized 
manner” for fair comparison as outlined in section 6.2. 
In order to evaluate the results additional information 
is retrieved from the singular values as computed by the 
SID methods WNNopt and N2SID. This is done in order 
to see possible improvements in the low rank detection 
by the new SID method N2SID over WNNopt. For an illus¬ 
tration of the potential improvement of the latter over 
N4SID we refer to m- 

6.3.1 The CD player arm data set (# 1 in Table 1) 

The results are summarized in Figure 1. The goodness 
of fit is given on the left side of this figure and the de¬ 
tected order h on the right side. For Aide < 400, N2SID 
outperforms all other methods and it was always bet¬ 
ter then N4SID. PEM is able to improve the results of 
N4SID in most cases. Its results remain however inferior 
to N2SID. In general N2SID detects a larger model order. 
For the shortest data lengths Aide = 80 and 120, WNNopt 
was not able to produce results since for that case the 
ADMM implementation broke down. The reason being 
that the Schur form was no longer computable as it con¬ 
tained NaN numbers. For that reason both the VAF and 
the order were put to zero. The WNNopt determined for 
150 < Aide < 300 a lower model order h compared to 


CD player arm 




Fig. 1. VAF Daisy # 1 - CD player arm. 
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Fig. 2. Singular values Daisy # 1 - CD player arm. 

6.3.2 The Continuous stirred Tank Reactor data set (# 
2 in Table 1) 

The goodness of fit parameter VAF and the estimated 
model order h are plotted in Figure 3 in the left and right 
graphs, respectively. For A^ide = 100 and 150 WNNopt was 
not able to provide numerical results. For that reason 
the corresponding VAF values are again fixed to zero. 
PEM resulted in bad VAF results for A^ide = 100, prob¬ 
ably as a consequence of bad initialization from N4SID. 
However, also for A^ide = 800 PEM had severly degraded 
results despite the fact that N4SID provided a model of 
comparable quality with the other SID methods. 

The singular values in Figure 4 indicate that for Mde = 
800 both N2SID and WNNopt have the same order esti¬ 
mate h. There is a clear gap in the singular values for 
WNNopt. The difference in detected order despite similar 
VAF indicates that order detection is not so critical for 
this example. 

6.3.3 The Hair dryer data set (# 3 in Table 1) 

The goodness of fit parameter VAF and the estimated 
model order h are plotted in Figure 5 in the left and right 
graphs, respectively. Here it is again clear that N2SID 
outperforms all other SID methods and provides more 
stable behavior when increasing A^ide compared to the 


Fig. 3. VAF Daisy # 2 - Continuous Stirred Tank Reactor. 

fluctuating behavior of the other methods, both with re¬ 
spect to VAF and estimated model order. WNNop fails to 
address the case of very small data length sets, i.e. when 
A^ide = 80 and 100. The combination of N4SID and PEM 
enables in a number of cases to provide models with a 
similar VAF compared to N2SID and in a small number 
of cases to slightly improve the results over N2SID. How¬ 
ever, this is not consistent, since for A^ide = 400 the VAF 
is worsened compared to the intialization with N4SID. 

Figure 6 diplays the singular values for the last data 
set where Mde = 400 in Figure 6. It confirms the im¬ 
proved potential in low rank approximation by N2SID 
over WNNopt. The latter method diminishes the gap, lead¬ 
ing in general to a larger model order estimation. This 
larger model order does however for this example not 
lead to a better output prediction. 

6.3.J^ The Steam Heat Exchanger data set (# 4 in Ta¬ 
ble 1) 

The goodness of fit parameter VAF and the estimated 
model order h are plotted in Figure 7. 

N2SID again for small data sets with A^ide ranging be¬ 
tween 150 and 750 yields the best output predictions of 
all methods. Comparing the VAF value in Figure 7 with 
those for the previous Daisy data sets reveals that the 
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Fig. 4. Singular values Daisy # 2 - Continuous Stirred Tank 
Reactor. 

values are smaller. This reflects problems with the data 
set due to lower signal to noise ratio, system nonlinear¬ 
ity, etc. Because of this we started the analysis with the 
smallest value of equal to 150, since for smaller val¬ 
ues poor results were obtained for all methods. 

The other methods show a similar behavior as for the 
previously analysed data sets: in most cases but not all 
PEM improves over N4SID, WNNopt displays inferior be¬ 
havior for iViUe < 1250, and both N2SID and N4SID (PEM) 
determine a smaller order then WNNopt. 

Finally, the plot of the singular values for the last data 
set in Figure 8 displays a similar behavior. Both singular 
value plots clearly support the automatic order selection 
made. However N2SID has a better trade-off between 
model complexity and model accuracy as expressed by 
the VAF. 

6.3.5 Heat flow density data set (# 5 in Table 1) 

The goodness of fit parameter VAF and the estimated 
model order h are plotted in Figure 9. 

For this data set WNNopt provides for 200 < A^ide < 450 
the best results but in general detects a larger model 
order. For the smallest length data set WNNopt produced 


Fig. 5. VAF Daisy # 3 - Hair dryer. 

inferior VAF. N2SID provides a better VAF prediction 
compared to N4SID and PEM and this for a smaller model 
order n as compared to WNNopt. 

From the singular values in Figure 10 for A^ide = 600 an 
order selection of 1 up to 4 is clearly justified by N2SID. 
The order selection made by WNNopt is much less clear. 

6.3.6 General Observation from the analysed Daisy 
data sets. 

The automized analysis of the 5 Daisy data sets clearly 
demonstrates the merit of the new SID method N2SID 
over the other representative identification methods con¬ 
sidered. Especially when considering data sets of small 
length it is able to make a good and sometimes excellent 
trade-off between model complexity and model accuracy 
as expressed by the goodness of fit. The improvement 
over the other analysed nuclear norm subspace identi¬ 
fication method WNNopt in order detection both in re¬ 
vealing a clear gap as well as in detecting models of low 
complexity is evident. 

7 Concluding Remarks 
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Subspace identification of multivariable state space in¬ 
novation models is revisited in this paper in the scope 
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Fig. 6. Singular values Daisy ?> - Hair dryer. 

of nuclear norm optimization methods and using the 
observer form. A new subspace identification method 
is presented, referred to as N2SID. N2SID is the first 
subspace identification method that addresses the iden¬ 
tification of innovation state space models without the 
use of instrumental variables (IVs). The avoidance of 
using IVs leads to a number of imp rove ments. First as 
shown in the experimental study in |24| . it leads to im¬ 
proved results in identifying innovation models when 
compared to existing SID methods, like N4SID and the 
recent Nuclear Norm based SID methods presented in 
fTT] and with the Prediction Error Method (PEM) [T5] . 
This improvement especially holds for small length data 
batches, i.e. when the number of samples is only a small 
multiple of the order of the underlying system. Second, 
as illustrated by Theorem 1, the methodology presented 
enables to provide insight on the necessary conditions of 
persistency of excitation of the input on the existance of 
a unique solution. Finally, the new N2SID methodology 
will enable to address other interesting identification 
problems in a subspace identification framework, such 
as the identification of distributed systems as shown in 

Hi nil [22]. 
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Fig. 7. VAF Daisy # 4 - Steam Heat Exchanger. 
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[27] C. Yu, M. Verhaegen, and A. Hansson. Subspace 
identification of local Id homogeneous systems. In 
IFAC Symposium SYSID. Beijing, pages 886-890, 
2015. 

Appendix: ADMM Algorithm 

We here state the ADMM algorithm for a generic nu¬ 
clear norm optimization problem with a quadratic reg¬ 
ularization term: 

minimize ||A(a:) -h Aojj* -I- ^{x — a)^H{x — a). (33) 

The presentation is an allmost exact citation from m. 
The variable is a vector x G K". The first term in the 
objective is the nuclear norm of a p x g matrix A{x) -I- Aq 
where A : K" —t is a linear mapping. The pa¬ 

rameters in the second, quadratic, term in the objective 
of (33) are a vector a G M" and a positive semidefinite 
matrix iJ G S". 

To derive the ADMM iteration we first write (33) as 

minimize ||Y||* -|- (l/2)(a: — a)'^H{x — a) 
subject to A{x) Aq = X 

with two variables x G M" and X G The aug¬ 

mented Lagrangian for this problem is 


= ll-’^ll* + ^( 2 ; - ayH{x - a) 

(34) 

+ Tt{Z^{A{x)+Ao-X)) 

(35) 

+ ^ -^( 2 ^) + ^0 — Y p. 

(36) 


where p is a positive penalty parameter. Each iteration 
of the ADMM consists of a minimization of Lp over x, a 
minimization of Lp over X, and a simple update of the 
dual variable Z. This is summarized in Table 4. 

The update in step 2 requires the solution of a linear 
equation, since Lp(x, X, Z) is quadratic in x. Setting the 
gradient of Lp{x, X, Z) with respect to x equal to zero 
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gives the equation 

{M + pH)x = Adj {pX + pAo -Z) +Ha (37) 

where ^adj is the adjoint of the mapping A and M is the 
positive semidefinite matrix defined by the identity 

M z = Aa^di{A{z)) Vz. (38) 


The minimizer X in step 4 is obtained by soft- 
thresholding the singular values of the matrix A{x) + 
^0 + -^/p- 


min{p,ij} ^ 

aigmin Lp(X, X, Z) = max{0,cri- } Uivf 

X i=i P 

(39) 

where Ui, Vi, Ui are given by a singular value decompo¬ 
sition 


A{x) Aq H —Z 
P 


min{23,g} 

i=l 


The residuals and tolerances in the stopping criterion in 
step 5 are defined as follows [T]: 

rp = A{pc)+Ad-X (40) 

^d ~ P-^adj (-^prev X) (41) 

ep = + ereimax{||^(x)||F, IAIIf, II AIIf} (42) 

Cd — V^^abs T Crel ||-4.adj (■^) || 2 ; (43) 

Typical values for the relative and absolute tolerances 
are Crei = 10“^ and Cabs = 10“®. The matrix Xpi-ev 
in (41) is the value of X in the previous iteration. 

Instead of a using a fixed penalty parameter p, one can 
vary p to improve the speed of convergence. An example 
of such a scheme is to adapt p at the end of each AD MM 
iteration as follows |H] 

{ T-p IkpIlF > p||rd||2 

p/t Ikdlk > plkpllF 

p otherwise. 


This scheme depends on parameters pL > 1, r > 1 (for 
example, p = 10 and r = 2). Note that varying p has 
an important consequence on the algorithm in Table 4. 
If p is fixed, the coefficient matrix H + pM in the equa¬ 
tion (37) that is solved in step 2 of each iteration is 
constant throughout the algorithm. Therefore only one 
costly factorization of iJ -|- pM is required. If we change 
p after step 6, a new factorization of i7 -I- pM is needed 
before returning to step 3. I is explain in mi how the 
extra cost of repeated factorizations can be avoided. 
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